Intelligent Selection of Language Model Training Data
Abstract
We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.
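The selection idea in the abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: it assumes the "comparison" is the difference between the two per-token cross-entropies, it uses add-one-smoothed unigram models in place of whatever models the paper uses, and the names `train_unigram`, `cross_entropy`, `select_sentences`, and the `threshold` parameter are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (token counts, total token count) for tokenized sentences."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-token cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    log_prob = sum(
        math.log2((counts[tok] + 1) / (total + vocab_size)) for tok in sentence
    )
    return -log_prob / len(sentence)


def select_sentences(in_domain, general, threshold=0.0):
    """Keep general-corpus sentences that look more in-domain than general.

    Assumption: a sentence is scored by H_in(s) - H_gen(s), and kept when
    the score falls below `threshold` (lower means more in-domain-like).
    """
    vocab = {tok for sent in in_domain + general for tok in sent}
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    lm_in = train_unigram(in_domain)
    # The general model is trained on the same text that is being scored.
    lm_gen = train_unigram(general)
    selected = []
    for sent in general:
        if not sent:
            continue  # skip empty sentences to avoid division by zero
        score = cross_entropy(sent, lm_in, vocab_size) - cross_entropy(
            sent, lm_gen, vocab_size
        )
        if score < threshold:
            selected.append(sent)
    return selected


in_domain = [["translate", "the", "sentence"], ["translate", "the", "document"]]
general = [
    ["translate", "the", "text"],
    ["buy", "cheap", "shoes"],
    ["the", "stock", "market", "fell"],
]
selected = select_sentences(in_domain, general)
```

In this toy run only the sentence that shares vocabulary with the in-domain sample scores below the threshold. In practice the threshold would presumably be tuned on held-out in-domain data, and stronger n-gram models would replace the unigram stand-ins.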
Similar resources
Do the Emotionally More Intelligent Gain More from Metacognitive Writing Strategy Training?
Though privileges ascribed to various facets of language learning strategy training have long been espoused with regard to varied language skills and components, the role some individual variables such as emotional intelligence might play in this respect seems to have received very scant attention. The researchers in the current study embarked on a probe into the impact of metacognitive strateg...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
A Phoneme-Based Student Model for Adaptive Spelling Training
We present a novel phoneme-based student model for spelling training. Our model is data driven, adapts to the user and provides information for, e.g., optimal word selection. We describe spelling errors using a set of features accounting for phonemic, capitalization, typo, and other error categories. We compute the influence of individual features on the error expectation values based on previo...
A hybrid CS-SA intelligent approach to solve uncertain dynamic facility layout problems considering dependency of demands
This paper aims at proposing a quadratic assignment-based mathematical model to deal with the stochastic dynamic facility layout problem. In this problem, product demands are assumed to be dependent normally distributed random variables with known probability density function and covariance that change from period to period at random. To solve the proposed model, a novel hybrid intelligent algo...
Design and Implementation of an Intelligent Part of Speech Generator
The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...